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Abstract 

This paper addresses model dimensionality reduction for Bayesian inference based on prior 
Gaussian fields with uncertainty in the covariance function hyper-parameters. The dimen¬ 
sionality reduction is traditionally achieved using the Karhnnen-Loeve expansion of a prior 
Gaussian process assuming covariance function with fixed hyper-parameters, despite the fact 
that these are uncertain in nature. The posterior distribution of the Karhnnen-Loeve coor¬ 
dinates is then inferred using available observations. The resulting inferred field is therefore 
dependent on the assumed hyper-parameters. Here, we seek to efficiently estimate both the 
field and covariance hyper-parameters using Bayesian inference. To this end, a generalized 
Karhnnen-Loeve expansion is derived using a coordinate transformation to account for the 
dependence with respect to the covariance hyper-parameters. Polynomial Ghaos expansions 
are employed for the acceleration of the Bayesian inference using similar coordinate transfor¬ 
mations, enabling ns to avoid expanding explicitly the solution dependence on the uncertain 
hyper-parameters. We demonstrate the feasibility of the proposed method on a transient 
diffusion equation by inferring spatially-varying log-diffnsivity fields from noisy data. The 
inferred profiles were found closer to the true profiles when including the hyper-parameters’ 
uncertainty in the inference formulation. 
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1 Introduction 


Inverse problems arise in many applications whenever we seek to find some information 
about a physical system based on some observations. From a computational point of view, 
a major challenge of inverse problems is their ill-posedness where there is no guarantee 
that a solution exists, multiple solutions may exist, or even the solution does not depend 
continuously on the observations. This can be significantly affected by measurement errors, 
and inferring a suitable solution from noisy observations is an important and challenging 
topic. 

In this paper, we are only concerned with Bayesian approaches to inverse problems. 
This is motivated by their ability of providing complete posterior statistics and not just a 
single value for the quantity of interest. The multi-dimensional posterior can be directly 
explored via Markov Chain Monte Carlo (MCMC). This, however, requires repeated simula¬ 
tions (sometimes hundreds of thousands) of the forward model, once for every proposed set 
of parameters of the Markov chain [T]. This practice renders Bayesian methods computa¬ 
tionally prohibitive for large-scale applications. Acceleration techniques have been proposed 
in the literature in which a surrogate model is constructed that requires a much smaller 
ensemble of forward model runs which is then used in the sampling MCMC step instead at a 
significantly reduced computational cost. Marzouk et al. [ 2 ] for instance proposed a spectral 
projection method that uses spectral expansion of the prior model in Polynomial Chaos (PC) 
basis. The PC method has been extensively investigated in the literature, and its suitability 
for large-scale models has been demonstrated in various settings, including ocean [HI IH El E] , 
tsunami lai, climate modeling j9] and subsurface flow modeling ng. 

The PC method has been shown to be efficient for inverse problems involving a limited 
number of stochastic parameters; yet in some cases the unknown quantity is a spatial or 
temporal field in which the number of stochastic parameters is quite large. Computational 
challenges in this case arise in the surrogate model construction as PC suffers from the 
curse of dimensionality m- In addition, convergence is hard to achieve using the Bayesian 
inference due to the high dimensionality of the posterior. To overcome this numerical issue, 
Marzouk et al. na introduced truncated Karhunen-Loeve (KL) expansions to parametrize 
the stochastic field, endowed with a hierarchical Gaussian process prior. The idea is to 
transform the high-dimensional stochastic forward problem into a smaller problem whose 
solution captures that of the deterministic forward model over the support of the prior. 
Galerkin projection on a PG basis was used to seek the solution of the problem, and a 
reduced-dimensionality surrogate posterior density was constructed that is inexpensive to 
evaluate. 

The Gaussian process prior assumed in Marzouk et al. [E] is associated with hyper¬ 
parameters that are rarely known in practice. Assuming otherwise renders the quantification 
of prior uncertainty unrealistic and incomplete. Hierarchical Bayesian inference is proposed 
in the literature for calibration in presence of uncertain hyper-parameters but is done a 
priori ra The method proposed by Marzouk et al. na does not explicitly consider the 
effect of length-scales and only includes one hyper-parameter accounting for prior variance. 
An attempt to extend the method proposed by Marzouk et al. [I2] for priors with uncertain 
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hyper-parameters has been recently proposed by Tagade and Choi [H]. In their work, a 
methodology is introduced to obtain a KL expansion of a stochastic process in terms of 
functions of the hyper-parameters. The prior uncertainty in these hyper-parameters was 
expanded in a PC basis, and Galerkin projection was used to evaluate PC coefficients of the 
surrogate model. The hyper-parameters hence become part of the inference problem and are 
estimated from the observations. 

This paper proposes an extension of the method of Marzouk et al. [ 12 ] that is also an 
alternative to Tagade and Choi method H- Our proposed method explores the origin of the 
KL expansion where it is based on the eigen-functions and eigen-values of a given covariance 
function. These eigen-functions form a basis in a space dictated by the covariance hyper¬ 
parameters. Our method utilizes a change of basis methodology and therefore transforming 
the KL expansion based on one certain set of hyper-parameters into another. A fundamental 
distinction of the present work is that we avoid constructing a PC expansion for the uncertain 
hyper-parameters, and instead use the PC expansion constructed for a reference set of hyper¬ 
parameters and apply transformations to obtain PC expansion for any another set of hyper¬ 
parameters. The advantage of the proposed method is that the dimensionality of the PC 
expansion is not augmented by the number of hyper-parameters of the covariance function. 
Also, our method avoids cases when the hyper-parameters have complex distributions and 
PC bases may not even exist. 

To outline the proposed developments, we start in Section by providing a statistical 
formulation of the inverse problem based on Bayesian inference. Section then presents the 
KL expansion and its generalization to account for uncertain hyper-parameters by means 
of change of basis. Section describes the role of PC in Bayesian inference acceleration. 
In Section numerical results for the calibration of a one-dimensional toy problem are 
presented and Section concludes the paper with a summary of the results, discussion and 
conclusion. 

2 Bayesian Inference 

Bayesian inference is a statistical approach to inverse problems that has gained much 
interest in different applications including ocean nanniiii, climate HD and geophysical [I| 
modeling. We review the Bayesian approach briefly below and discuss its implementation to 
our problem. 

Our objective is to infer a deterministic field m{x), for some x E D, from a finite set 
oi No > 1 observations d G We consider situations where the observations d are 

not direct measurements of m{x), but are derived quantities that can be predicted using 
a model-problem (typically a set of partial differential equations), often called the forward 
model, relating the m(x) to the model predictions: m{x) ^ u{m) G . The Bayesian 
formula updates our prior knowledge of the m introducing an error model for the discrepancy 
between the model predictions u{m) and the observations d; the Bayes’ rule is expressed 
as [18] : 

p{m, al\d) oc p{d\m, al)p^{m)po{al), (1) 
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where p(d\m, a^) is the likelihood of the observations, given m and the error model hyper¬ 
parameter with prior Po(c’'o), and Pmijn) is the held’s prior. For simplicity, an unbiased 
additive Gaussian error model will be considered, 

e = d- u{m), e J\f{0,allNo), (2) 

where A^(0, denotes the centered multivariate Gaussian distribution with diagonal 

covariance In other words, the errors in the observations are assumed independent. 

For this choice, the likelihood becomes 

No 

p{d\m,al) = Ylpe{di - Pe{x,al) = 

i=l 

The main difficulties with the posterior above are the infinite dimensional character of 
m{x) and its prior definition. A discretization of m{x) is needed to perform the inference 
and setting a finite dimensional prior distribution. If m(x) is endowed with a Gaussian 
prior, it is fully characterized by its second-order properties, namely its mean fi(x) and 
covariance function C{x,x'). From p and C, one can rely on truncated Karhunen-Loeve 
(KL) decomposition to represent m{x) as a convergent series involving a finite set of KL 
coordinates (or expansion coefficients) pk, k = as discussed in Section]^ The 

inference problem can then be reformulated for the vector rj of coordinates %, leading to 

p{r], al\d) oc p{d\r], (jl)p^{'q)po{al), (4) 

where Priirf) = exp{—r]^r)/2)/{2Tr)^/^ is the Gaussian prior of the KL coordinates. 

As discussed below, the covariance function C{x,x') is generally selected on the basis 
of limited knowledge and the inference of m{x) can be improved by introducing additional 
hyper-parameters q in the definition of C i.e. C{x,x',q). This yields the generalized Bayes’ 
formula, 

Piv, Q, ot Pid\v, q, (Tl)Pni'n)Pq{q)Poi(^l), (5) 

where Pq is the prior distribution of the covariance parameters. For the case of covariance 
with hyper-parameters g, the likelihood takes the following general form, 

No 

p{d\rj, q, al) = Ylpeidi - Ui{rj, q), a^), ( 6 ) 

1=1 

with Pe defined in Eq. ([^, and q) being a short-hand notation for the model prediction 
Ui{m) with m(rf,q) the reconstructed field. 

Inferring the field then amounts to sampling the posterior of KL coordinates rj and hyper¬ 
parameters q. In general, the sample space is high-dimensional and suitable computational 
strategy is the Markov chain Monte Garlo (MGMG) method. In this work, we rely on an 
adaptive Metropolis-Hastings MGMG algorithm dniEO! to accurately and efficiently sample 
the posterior distribution p(rf,q,al\d). This requires the evaluation of the posterior (up 
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to its normalization constant) for multiple sample values of {r),q,a^). The computational 
flow-chart for an evaluation of the posterior is schematically illustrated in Figure Briefly, 
given a sample value of q, the dominant KL modes of C{x,x',q) are computed and the 
corresponding field m{x) is constructed using the sampled value of rj. This held is fed into 
the solver to compute the model predictions u{rf, q) which are used, together with the sample 
value of the model error parameter cr^, to successively compute the likelihood and hnally the 
posterior. 



Figure 1; Flow-chart for the evaluation of the posterior distribution in the inference problem. 

In general, the most computationally demanding part for sampling the posterior is the 
computation of the model predictions, given ( 77 , qr) (that is the realization of the held). 
This is particularly the case when the predictions involve the solution of partial differential 
equations. This computational cost motivates the substitution of 77 ( 77 , O) with a polynomial 
surrogate model u{ri,q), whose evaluation is inexpensive compared to the solution of the 
complete model. The surrogate is constructed offline and subsequently used on-line when 
running the MCMC algorithm. Specihcally, the likelihood of the observations is approxi¬ 
mated using 


No No 

p{d\T],q,al) = Ylpe{di-Ui{T],q),al) Ylp^di - Ui{^{'n,q)),al), (7) 

i=l i=l 

where, as mentioned previously, u{^) is a polynomial and ^ : ( 77 , q) ^( 77 , q) is an explicit 
change of coordinates. Construction of the surrogate model for the predictions is detailed 
in the next two sections; Section introduces the q-dependent coordinate transformation, 
while the polynomial approximation 7i(^) is discussed in Section]^ together with the resulting 
surrogate-based sampling scheme. 

3 Coordinate transformation for Uncertain Covariance Functions 

3.1 Karhunen-Loeve expansion 

Let D C d > 1, be a bounded domain, and denote X = L^{D) equipped with inner 
product (•, ■)x and norm || • \\x'- 

« G X llujlx < 00 , \\u\\‘j^ = {u,u)x = / \u{x)\'^dx. (8) 

J D 
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Consider a real-valued stochastic process M{x,0i)) with mean and continuous co- 
variance function C{x,x') on D x D] cu is a. random event belonging to a sample space Q of 
a probability space (O, E, P). The covariance function is defined as 

C{x, x') = E [{M{x, •) - ii{x)){M{x', •) - ii{x'))] , (9) 

where E denotes the expectation operator. The covariance function C is symmetric positive 
semi-definite and thus by Mercer’s theorem [2T] it has the following spectral decomposition: 

oo 

C{x, x') = Afc0fc(cc)0fc(a;'), (10) 

k=i 

where the Xk and 4>k{x) are the eigen-values and associated (normalized) eigen-functions of 
the linear operator corresponding to the covariance function C; they satisfy the Fredholm 
equation of the second kind: 


' D 


C{x,x )0fc(aj ^dx A/j0fc(aj), ||0/c||x 1- 


( 11 ) 


The eigen-values A^ are real and countable and the eigen-functions 0fc(a;) are continuous 
and constitute an orthonormal basis in L^(D). Ordering the eigen-values in a decreasing 
sequence Ai > A 2 > • • • > 0, the truncated Karhunen-Loeve (KL) expansion of M is 
given by [22] 

K 

M{x,lj) Mk^x.uj) = fi{x) + yy ^/\4>k{x)r}k{uj), (12) 

k=l 

where K is the number of expansion terms retained in the spectral approximation. The 
stochastic coefficients 

r]k{uj) = {M{x,uj) - iJ,{x),(l)k{x))x, (13) 


are mutually uncorrelated random variables with zero mean and unit variance, such that 
^iVkVk'] = Skk'- Under the assumption that M is a Gaussian Process (QV) denoted by 
M ~ QP (/U,C), the r^fc’s are Gaussian and also independent. The truncated KL expansion 
is optimal in the mean square sense, meaning that of all possible K-term expansions, the 
Mk in Eq. ( 12 ) with A^ and (j)k{x) satisfying Eq. ([TT| minimizes the mean-squared error in 
the approximation of M [22|- While it is known that the KL decomposition of M converges 
uniformly as K —?• 00 [2H], the truncation error has implicit dependence on the covariance 
function C. 

The KL expansion is often employed to reduce the dimensionality in inverse problems, 
considering the expansion coefficients 77 ^= 1 ,in Eq. (12) as reduced coordinates for the 

In the Bayesian framework, 


field m{x) to be inferred from the collected observations 
this amounts to the determination of the posterior distribution of the expansion coefficients 
vector 77 , which can be sampled or analyzed to estimate the characteristics of the field 
m{x) (in particular, retrieving the median, MAP value, confidence intervals,...). However, 
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the posterior and so the inferred field m have implicit dependencies on the assumed prior 
covariance structure. This point has motivated the introduction of parametrized covariance 
families, as discussed in the following section, where the covariance parameters are treated 
as hyper-parameters in the inference procedure. 


3.2 Covariance function with uncertain hyper-parameters 

From now on, we assume the prior of m{x) to be Gaussian and so completely characterized 
by its mean /r(a?) and covariance function C. However, in many applications, not all the 
aspects of the covariance function are well-known a priori. The stationarity of the covariance 
function can be easily determined and confirmed, yet, we have a large uncertainty in the other 
characteristics such as the values of the hyper-parameters. An example of a parametrized 
covariance function is 

C{x,x') = ajexp ~ x'Y'yV{x — a;') j -I- o-^x^x' + cr^ + (14) 


where M is a symmetric positive definite matrix. The covariance hyper-parameters M, 
cr|, (T^, cr^, cr^ are usually not exactly known a priori and should be treated as uncertain 
quantities. For many covariance functions it is easy to interpret the meaning of the hyper¬ 
parameters, which is of great importance when trying to understand the data. Traditionally, 
the hyper-parameters are estimated using Gaussian Process Regression (GPR) before infer¬ 
ring the model parameters [l3j. To this end, a set of possibly noisy observations of the field 
m{x) are used to perform stochastic interpolation of static data collected at few locations 
and maximize the marginal likelihood function using Bayesian inference or optimization 
techniques. Optimal values of the inferred hyper-parameters are then used in the covariance 
function and KL expansion is applied as described in Eq. (12). The uncertainty bound can 
be estimated using GPR but is usually not considered in the expansion due to the complexity 
of the resulting model. This paper addresses the uncertainty in the hyper-parameters of co- 
variance models. Specifically we develop a formulation that enables inferring the covariance 
function hyper-parameters along with the KL stochastic coordinates %. The formulation is 
based on basis transformations as described below. 


3.3 Stochastic coordinate transformation 

Without loss of generality, we assume that the stochastic prior process M is centered 
{pi{x) = 0) and has a parametrized covariance function C(a;, x', q) defined by a random vector 
q C of hyper-parameters {h is the number of hyper-parameters, e.g. q = {M, cr^, cr^, a^} 
for the example in Eq. 0). with joint density Pq. Because of the dependence of the 
ance function on q, the KL expansion of M in Eq. (12) becomes: 


covari- 


M, 


K 


^ _ r 

ix,uj,q) = y/Afc(q)0fc(a;,q)r]fc(n;), / C<yX,x',q)(j)k{x',q)dx = \k{q)(t>k{x,q). 

k=i 


(15) 
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To simplify the notation, we drop the x and x' dependence and introduce the scaled eigen¬ 
functions 


K 

^k{q) = \/^fc(g)0fc(g), so MK{uj,q) = (16) 

k=l 

We further assume the continuity of the scaled eigen-functions with respect to q, in 
the sense (see [2S]) ^Dk{q) > 0, \\^k{q) - ^kiq +Sq)\\x < and = 

D^^\q) < oo uniformly, so that 

E [\\MK{q) - Mxiq + ^9)11^ < DK{q)\\Sq\\\. (17) 

In practice, when decomposing a covariance function C{q), the normalized eigen-functions 
are defined up to a factor of ±1. To ensure the qf-continuity of the 0fc(q)’s, we have to 
select a consistent orientation of eigen-functions. A possibility, followed in this work, is to 
define the orientation of the eigen-functions with respect to a reference set of eigen-functions 
k = 1,..., A'}, e.g. the reference set defined below, such that {(t>k{q) has a constant 
sign for all q [26] . The dependence of the eigen-functions on the hyper-parameter q is further 
illustrated in Section l3^ below. 

Let be a covariance function representative of the qr-dependent covariance function 
C{q) of M. As investigated below, a possible choice for can be 


r = C = / C{<,)p,(g)dq, 


(18) 


that is the qf-averaged ofC{q), or a particular realization of C corresponding to a deterministic 
value q^ of the random parameters {e.g. nominal values obtained using GPR na). We denote 
(jff^ the ordered and normalized eigen-vectors of C’'. Note that {0^, A: = 1, 2,... , oo} is an 
orthonormal basis of X; as a result, any scaled eigen-function ^k{q) can be expressed in this 
basis: 


hkiqWf, Kk'{q) = (<H, ti'(q))jt ■ 


(19) 


k' = l 


The continuity of the scaled eigen-functions implies the continuity of the projection coeffi¬ 


cients bkk'{q)- For computational purposes, the expansion in Eq. (19) needs to be truncated 
to the first terms. Without loss of generality we shall use in the following 
allowing for convergence analysis with respect to a single parameter K. 

Further, the change of basis gives: 


K 


K 


K 


K 


MK{uJ,q) = J^d>fc(qr)%(n;) '^bkk'{q)4>l' Vk{(^) = ^ (f>lflk{oJ, q), (20) 


k=l 


k=l \k'=l 


k=l 




where we have denoted 


K 




( 21 ) 


k'=l 


The transformation shows that the qr-dependence of C can be translated into an expansion 
Mk with q dependent scaled eigen-functions, see Eq. (16), or approximated by a g-dependent 
linear transformation of the random variables in Eq. (20). Specifically, denoting the latter 
approximation we have the approximations 


K 


K 


M{u,q) ss MK{ut,q) = '^^t{q)flk{i^) « MK{i^,q) = 


( 22 ) 


k=l 


k=l 


with fjk related to the ^’s by Eq. (21). 

We observe that q) is a linear combination of standard Gaussian random variables, 
so it is also Gaussian (with zero mean). However, the fjkioj, q) are generally correlated. The 
change of random coordinates in Eq.(21) can be cast in matrix form: 

f}{u},q) = 13{q)rj{u). (23) 

The covariance matrix for the random coefficients r), denoted S^(q), can be expressed as 

E^q) = E [viqWiq)] = B{q)B\q). (24) 

We shall assume that E^(q) is invertible (for almost every q); a sufficient condition is that 
^i<k<K{q) is not orthogonal to span{(/)((,..., 0^}. In addition, the conditional distribution 
of f), given g, is 


PtjivW) = 


k/2PW{q)\ 

where |E^(q)| is the determinant of E^(q). 


exp 




(25) 


3-4 Example 

We now provide a brief illustration of the convergence of the error in the approximation 
of M{q). To this end, we consider D = [0,1] and a centered Gaussian process M with 
covariance function 

C{x, x', q) = a) exp (^- ^ , (26) 

with hyper-parameter vector q = {a'jj}. In this case, only the correlation length I affects 
the shape of the eigen-functions, while the process variance aj simply scales the eigen-values. 
Therefore, we fix (jj = 0.5 through-out the section and assume uncertainty in I only, that is 
q = {/}. Specifically, we assume the hyper-parameter I to have a uniform distribution in the 
range [0.1,1]. It is important to note that the number of KL modes needed for convergence 
highly depends on the hyper-parameter 1. In particular, if M has small-scale features (small 
1) a large number of KL modes will be needed. 
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For the selection of the reference covariance function, we contrast the choice = C{r), 
for several values T G [0.1,1], with the case C =C. The KL decompositions are numerically 
approximated with Galerkin piecewise constant modes over a uniform grid having = 128 
elements in space. Figure compares in the left plot the considered reference covariance 
functions O' and in the right plot the respective decay rates with k of their eigen-values 
A^. When using C(r), it is seen that the smaller F the slowest the decay rate, as expected, 
whereas for the qr-averaged covariance C the decay rate is asymptotically similar (but with 
a lower magnitude) to the lowest F in the uncertainty range. Also, note that C is evidently 
not Gaussian. 




Figure 2: (Left) Reference covariance functions C’’ = C{F) for different values of F, as indicated. Also 
plotted is the g-averaged covariance C . (Right) Spectra of the corresponding eigen-values decay with K. 


To quantify the error in the approximation of M{u, q) by the proposed transformation 
method, we introduce the following relative error measure 


eMiK,q) 


M{q) - M^iq) 




\\M{q)\\ 




(27) 


where 

l|C'llh„,„,=E|(UC)A-|. ( 28 ) 

The error eM{K, q) integrates both the truncation error in approximating M with Mk, and 
the subsequent projection error of Mk into the space of reference modes <j)\. The error cm 
is estimated by means of Monte Carlo sampling where realizations of M are generated given 
q-, these realizations are projected on the W-dimensional dominant space of C{q) in order to 
compute the coordinates rjk (see Eq. (|I^) which are transformed using Eq. ( |^ to obtain 
the corresponding realizations of Mk- Observe also that \\M{q)\\^ 2 ^^ = (Jf- Einally, the 

local (squared) error q) can be averaged over q to yield the averaged error, which we 

denote Em{K). 

The mean square error Em{K) is shown in the left plot of Eigure|^ Plotted are curves 
for different reference bases: using C’’ = CiV) with selected correlation lengths E within 
[0.1,1.0], and the q-averaged covariance function C. A first comment from these curves is 
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that the error decreases as K increases as expected. However, for r > 0.1, the error Em{K) 
stagnates as K increases when using = C{V). The stagnation occurs at lower K when 
r increases. This stagnation can be explained from the spectra reported in Figure which 
shows that when using F > 0.1 the magnitude of quickly decays with k to zero machine 
precision, such that subsequent modes are not correctly estimated and cannot provide a 
suitable projection basis. To further illustrate the effect of finite numerical accuracy, we 
provide in Figure plots of eigen-functions (j)k{x,l) for selected k and {x,l) G D x [0.1,1]. 
It is seen that for A: = 1, 4 and 7, the dependence on I of the numerical eigen-functions is 
smooth. In contrast, for k = 10 (resp. 13 and 19) the computed eigen-functions are seen to 
be noisy for I > 0.9 (resp. I > 0.5 and 0.25) because of finite numerical accuracy. Clearly, 
this indicates that under-resolved modes could be disregarded and that the reference basis 
should include only modes with indices k such that A^/A][ remains in achievable accuracy 
(?a 10“^® for double precision). To keep the analysis simple, and because our approach is in 
fact robust to under-resolved modes, we continue in the following to compare for the same K 
the different choices of reference covariance functions. Note also that for the reference basis 
using the shortest correlation length, /’' = 0.1, and the qr-averaged covariance, this numerical 
issue has not yet emerged for the range of considered K, and the corresponding errors decay 
monotonically up to iF = 25. In addition, it is seen that the error curve corresponding to 
the q-averaged reference covariance function C has the lowest approximation error Em{K) 
for all K. 

This is not a surprise since by construction this choice uses eigen-functions spanning 
the optimal subspace to represent M(u),q) when q varies with law Pq{q). In fact, finding 
the iF-term expansion minimizing the q-averaged mean square error approximation of M ~ 
QV{0,C{q)) amounts precisely to the decomposition of C. In other words, if using the K 
dominant eigen-modes of = C to construct the reference basis is non optimal to represent 
M{q) for any value q (obviously for each q the optimal choice is the eigen-modes of C{q)), 
there is no better choice on average over q. 




Figure 3: (Left) Error Em{K) in approximating the Gaussian Process M by Mk for different reference 
covariance functions based on selected correlation lengths V as indicated. Also plotted are results obtained 
with C. (Right) Relative error cm (IF = 15, 1) for the same cases as in the left plot. 

To better appreciate the behavior of the error with the hyper-parameter I in the present 


11 













example, the right plot in Figure reports the evolution of eM{K,l) for K = 15, using the 
same reference covariance functions considered previously. It is seen that for all reference 
covariance functions, the error ejvf(iF = 15,/) increases when I decreases, reflecting the in¬ 
creasing truncation error for K = lb when M involves smaller features. However, different 
behaviors are reported depending on the choice of C’' when I increases. When using = C{r) 
with r > 0.3, the error converges to machine precision when / > /’’, meaning that in this 
situation the 15-dimensional reference subspace span{0^ = (/’’), k = 1,..., K} essentially 

encompasses the 15-dimensional dominant subspace of C(/ > /’’). Further, this behavior 
highlights the robustness of the change of coordinates, even for situations where finite nu¬ 
merical accuracy prevents the correct determination of the whole set of eigen-functions. On 
the contrary, the choice C(/^) with r < 0.2, while yielding a lower error at small correlation 
length I I'', exhibits a stagnating error for I ^ ^ denoting that the corresponding iF = 15- 

dimensional reference subspace is not rich enough to encompass the dominant subspaces at 
larger correlation lengths. Roughly speaking, the reference eigen-functions are too oscillating 
to properly represent processes with long-range correlations. Finally, the selection of C for 
the reference covariance function provides the best compromise, by construction, maintaining 
a maximum error euiK = 15, /) less than 10“^ over the whole range of /. 



Figure 4: Dependence of eigen-functions <pk{<l) with the length-scale hyper-parameter I and selected k as 
indicated. 


4 Polynomial Chaos Surrogate 

A suitable Polynomial Chaos (PC) expansion for the model predictions is constructed to 
accelerate the Bayesian inference process. In Section [44] we briefly review the PC methodol¬ 
ogy and provide some details regarding the numerical methods used in the examples provided 
in Section Then, in Section |4.2| we focus on exploiting the PC surrogates to efficiently 
handle uncertain hyper-parameter through the change of coordinates introduced previously 
Finally, Section |4^ provides a brief analysis of the PC surrogate error. 


in Section 3.3 
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4-1 Polynomial Chaos expansion 

Polynomial Chaos (PC) is a probabilistic methodology that expresses the dependencies of 
a model solution on some uncertain model inputs, through a truncated spectral polynomial 
expansion mm- Let U £Y he solution of a mathematical model C {e.g. Partial Differential 
Equations), formally expressed as CU = 0. We are interested in situations where the model C 
is uncertain and parametrized with a finite set of independent second-order random variables 
^ = (Cl, •••; Cm) with known probability distribution. For simplicity, we shall restrict ourselves 
to the case of i.i.d. standard Gaussian random variables Ci, and will denote the density 
function of and L 2 {p^) the space of second order random functionals in that is 

n(C) e L 2 {p^) ^ j -j\v{^)\^p^{^)d^ < oo. (29) 

Since the model depends on its solution is also generally dependent on ^ and satisfies 

mUi$) = 0, a.s. (30) 

Let G N} be a complete orthonormal set of L 2 {p^), such that the model solution has 

an expansion of the form 

(7«) = = (31) 

aSN 

where the equality stands in the mean square sense and the expansion coefficients Ua & Y 
are called the stochastic modes of U. A classical choice for the random functionals are 
orthonormal multi-variate polynomials in leading to the so-called PC expansion of U{^). 
The Ci being standard Gaussian random variables, the are in fact normalized multi¬ 
variate Hermite polynomials For practical purposes, the PC expansion of G(^) needs 
to be truncated. When the basis is truncated to total order o the total number of terms in 
the PC expansion is given by P -|- 1 = (A” -|- o)!/(A!o!) and therefore increases exponentially 
fast with both the expansion order o and the number N of random variables Ci- The series 
expansion approximating P(^) is then finite and will be denoted P(^) in the following: 

p 

(32) 

The existence and convergence of this series is asserted by the Cameron-Martin theorem [28] 
with the condition of U having a finite variance. The rate of convergence, and hence the 
number of terms in the series, depends on the smoothness of U with respect to The series 
converges spectrally fast with P when U is infinitely smooth. 

Various methods have been proposed for the determination of the PC coefficients Ua- 
They can be distinguished into the Non-intrusive and Galerkin methods. Non-intrusive 
methods rely on an ensemble of deterministic model evaluations of P(^), for particular real- 
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izations of ^ selected either at random or deterministically. Non-Intrusive methods include 
Non-Intrusive Spectral and Pseudo-Spectral Projection [29l [301 El], Least-Square-Fit and 
regularized variants [321 ESI ED, Collocation (interpolation) methods [35l [36l |37| , that are 
often combined with Sparse-Grid algorithms to reduce computational complexity. 

In the present paper, we instead rely on the Galerkin projection method (231111 for which 
the expansion coefficients Ua are defined through a reformulation of the model Eq. (j^ , using 
a weak form at the stochastic level. Specifically, Eq. (30) is projected on the PC basis, a 
procedure resulting in a set of P -|- 1 coupled problems. 


£(«) =0, I3 = 0....,P. 


(33) 


q ;=0 


Numerical algorithms have been proposed to efficiently solve this set of coupled problems, 
both in the case of linear operators P(^) (see e.g. |HH| for elliptic and parabolic problems) 
and non-linear operators (see e.g. [HHl SO] and references in m)- 


4-2 PC surrogate for a parametrized eovarianee 

Returning to the inference problem, we now want to construct a global PC surrogate 
for the model predictions, that accounts both for randomness of Mk, through its random 
coordinates rj, and the uncertainty in its covariance function, through the random hyper¬ 
parameter vector q. We assume that the model problem amounts to solving for U a model 
depending on M^- Using the notations above, it is written formally as 

^'n,Q)U{'n,q) = 0. (34) 


The previous equation has motivated the idea of expanding the dependence of U with respect 
to the random vectors rj and qr on a PC basis [121 El, that is using U{rf,q) E„CA(r7,g). 
In the following, we consider an alternative approach, taking advantage of the change of 
coordinates discussed in Section [^ The change of coordinates allows us to approximate 
Mxiq) on the fixed reference basis of KL modes {(pl,k = 0, ...j/L}, through the linear 
mapping rj i-)- f]{q) = B{q)r]. Eq. ( |25j ) provides the density of r) conditioned on q. The 
model problem can therefore be recast as 


£( 77 ) 1 /( 77 ) = 0 , where 77 ~ p^( 77 |qf). 


(35) 


The last expression shows that we only need to construct an approximation of the mapping 
77 I-7- 1 /( 77 ) which is accurate enough with respect to the conditional density Pj)(77|q) when q 
varies. To get rid of the qr-dependence of the conditional density, we can consider averaging 
Pf) over q. In the case of the reference covariance function = C, it can be shown that 


Pfi{'n\q)pg{q)dq = 


v/27r^|A2 


exp 


y(A") 


-1 


V 


(36) 
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where Is? = diag (A^,..., A^). In other words, the qr-marginal of the conditional density 
yields independent Gaussian random variables. This suggests constructing an approximate 
mapping of f] ^ U, solving the model problem for a reference Gaussian held dehned as 

K 

Mjpii) = E (37) 

k=l 


where the ^^’s are independent standard Gaussian random variables. It corresponds to a 
reference model problem C{^) based on the reference Gaussian process As before, 

we denote G(^) the PG approximation of the reference model problem £(^)[/(^) = 0. From 
this PC approximation, we can approximate the model problem solution for couples (77, q) 
through 

p 

U{r],q) = ^{q)v, (38) 

a=0 


where 

Based 


the q-dependent matrix B{q) expresses the change of coordinates {ri,q) e- )• ^(77,9) 
on Eq. (37), we propose to use 


Bki{q) 


Bki{q) 

0 , 


KIK > 

otherwise. 


(39) 


where k > 0 is a small constant related to the numerical accuracy (typically k ~ 10“^^) intro¬ 
duced to avoid ill-dehnition of the ^^’s associated to negligibly small A^. The K-thresholding 
leads to transformed coordinates ^(77, q) where the hrst K^^{k) components are non trivial, 
with K^^{k) < K. Note that the number of non-trivial components of ^ only depends on 
the reference covariance, C’’, and not on q. Also, the PC construction of U{$,) can in fact be 
reduced, considering instead of with computational complexity reduction as 

a result when < K. However, we shall continue to report results as a function of K for 
simplicity. 

When the reference covariance is not the q-averaged of C{q), the q-marginal condi¬ 
tional density pfj remains Gaussian but introduces correlations between components. These 
correlations can be dealt with by introducing an additional change of basis in order to rede¬ 
fine a reference Gaussian process in terms of independent standard Gaussian random 
variables ^^’s. In that case, Eq. (39) must be accordingly modified to account for the ad¬ 
ditional change of coordinates. Alternatively, when using = C(q^), we can continue to 
define the reference Gaussian process by Eq. (37), which corresponds to solving the un¬ 
certain model problem assuming that 77 has for density conditioned on q = q’’. Although 
simpler, the approach is expected to yield a higher approximation error on average (over q), 
as explained below. 
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4-3 Example 

We consider the following model-problem consisting in the ID transient diffusion equa¬ 
tion, 

dU d f dU\ 


where the diffusivity is a stochastic field. Eq. (40) is solved for t E [0, T], in the unit domain 
D = [0,1], and with deterministic boundary conditions U{x = 0,t) = —1, U{x = l,t) = 1, 
and homogeneous initial condition U{x,t = 0) = 0. We consider a log-normal stochastic 
diffusivity field of the form, 

z/= z/Q + exp(M), (41) 

M is a (centered) Gaussian process with uncertain covariance function C{q). With uq > 0 
the diffusivity is bounded away from 0 which ensures the well-posedness of the problem. In 
the computations we set z^o = 0.1. In addition, we re-use the settings of Section [3^ with 
Gaussian covariance function having an uncertain length-scale I with uniform distribution in 


[0.1,1] and fixed variance aj = 0.5. For the solution of Eq. (40) we use a classical Pl-finite 
element method (continuous piecewise linear approximation) for the spatial discretization, 
with a second order implicit time-integration scheme. 

To investigate the error introduced by approximating M ^ U hj the PC map Mk U, 
we define the following error measures on the model problem solution. We first define the 
relative local error eu{o,K,q) as 


4(0. K, q) 


mM(g))-Lr(i(;q))ll 


(42) 


where 


\\V\\Un,Y)=^ 




(43) 


uo j 

This error measure incorporates the effects of several approximations: the approximation 
of M{q) on the W-dimensional reference subspace, the truncation of the PC expansion 
to finite order o, and the spatial and time discretization errors inherent in the numerical 
resolution of the model problem. Because the PC surrogate will be used in place of solving 
numerically the model problem (given rj and q ), we should not be concerned with the spatial 
and time discretization errors, and rather use for U{M{q)) its discrete counterpart, provided 
that the same spatial and time discretizations are used. For the tests presented in this 
section, we use a uniform mesh with 56 elements and a fixed time-step At = 10“^. These 
discretization parameters were selected to ensure that the error measurements reported below 
are dominated by the K and o-order truncation effects. Doing so, the local error efj can be 
estimated by means of Monte Carlo average proceeding as follows. For a sample of q, a) 
we generate a sample of the Gaussian process M{q) on the finite-element mesh and solve 
the corresponding deterministic diffusion problem for the sample oi U{M(q)); b) we project 
M(q) on the KL subspace, to obtain the KL coordinates rj which are further translated 
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to ${r],q), and the PC approximation U{$) is evaluated (see Eqs. 38 39); c) we compute 
\\U{M{q)) — t^(^)||i 2 (£)xr) sample. Further, we set T = 0.05 in order to focus the 

error measure in the transient period. 

Similarly, the local error can be q-averaged to yield the relative global error counterpart: 


EUo,K) 


H \mmQ)) - mH; q))\\l,ia,Y,PMdq 

J -f \\U{M(q))\\l^jayiP,(q)tki 


(44) 


Figure ^reports the global error for the present test problem. The left plot depicts Eu{o, K) 
as a function of K and for a PC order o = 10. Errors are shown for same selection of 
reference covariance functions C’’ used in Section 13.41 We observe that for all the selected 
reference covariance functions, the global error on U stagnates for W >9. This indicates 
that the PC truncation becomes the dominant source of error for K > 9. It is also seen that 
for all K shown, the error is the lowest when using C for reference covariance, as expected. 

Further, when using C{q^) as reference covariance function, the dependence of the global 
error on the reference length-scale F is non monotonic, but presents a minimum around 
r = 0.4. This minimum can be explained by the competition of two effects. On the one hand, 
we have seen that increasing causes an increase in the approximation error of M, which 
translates in a larger approximation error on U. On the other hand, it can be shown that 
the lower the more the q-marginal of departs from the variates standard Gaussian 
distribution assumed for the construction of U, with increasing averaged approximation error 
on ?7 as a result. The right plot of Figurealso depicts the global error, but now for a fixed 
number of KL modes, K = 15, and increasing PC order o G [2,10]. Again, curves are shown 
for the different reference covariance functions. Similarly to the previous results, the global 
error is seen to stagnate for o ^ 8, indicating here that for larger o the KL truncation error 
is dominant. In addition, for all shown o, using C for reference covariance function appears 
to be superior to the choices C{F), while the later choice again exhibits a non-monotonic 
dependence of the error with respect to V. 




Figure 5: Global error Eu{o, K) of the PC approximation JJ of the diffusion model problem solution. (Left) 
The plot shows the dependence of the error with K using a PC order o = 10. (Right) The plot is for different 
o and K = 15. The curves correspond to different definitions of the reference covariance function C'’ = C{V) 
with F as indicated or the g-averaged covariance function C. 
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Figure 1^ presents in the left plot the normalized local error eu{o,K,l) for the case of 
K = 15 and PC order o = 10. As mentioned previously, the local error combines the effects 
of approximating M by Mk-, which has been reported in the right plot of Figure]^ and the 
PC truncation error. Focusing first on the cases where C{q^) is used as reference covariance, 
we observe a more complex behavior of the local error with I, depending on the selected 
reference length-scale A. Specifically, the local error at some I is always the lowest for the 
reference length-scale r the closest to 1. This is expected, as using A is the optimal choice, 
given K and o, to achieve the lower error at I = V. For C’’ = C, which ensures by construction 
the best compromise over the q-range, the local error remains below 2% over the whole range 
of hyper-parameters. Further, using A > 0.2, the local error first monotonically decreases 
with I and then stagnates (except for = 1 where stagnation is not achieved). 

Contrary to the local approximation error on M, the stagnation with I 1 occurs at an 
error level that strongly depends on l^. This seems surprising as we have seen (right plot of 
Figure that for /^ > 0.2 the approximation error on the process goes to zero as / —>■ 1, such 
that we could have expected an essentially constant local error eu iov 0.2 < < I, depending 

only on the PC expansion order o. But one has to take into account the mapping from rj 
to ^ to understand the behavior of the local error. Specifically, the PC approximation is 
constructed to minimize the approximation error for the reference model problem based on 
MPC( 

or K^^{k,)) in Eq. ( p7| ), where the ^^’s are independent standard random variables. 
Therefore, the PC approximation aims at minimizing the error with respect to the standard 
iF-variates Gaussian measure. When querying the PC approximation for some specific hyper¬ 
parameters value q ^ follows a conditional Gaussian distribution p^{$\q), induced by 
the transformation ^ = B{q)rj. In general, this conditional distribution differs from the 
standard Gaussian one, affecting the quality of the approximation depending on q. To get 
better insight into this effect, we remark that the conditional density p^(^|q) is centered and 
Gaussian with covariance structure S|(q) = B*{q)B{q). 

To measure the departure from the standard Gaussian multi-variates case, we present 
in the right plot of Figure the largest eigen-value /5max(^) of S|(q), as a function of 
q = {1} and for the different reference covariance functions. C^max measures of highest 
stretching rate induced by B. For the results reported in Figure we used a thresholding 
parameter k = 10“^^ in the definition of B{q). It is seen that when using for reference, 
\//3max(0 increases exponentially fast with V — I > 0, denoting a more and more stretched 
distribution for ^{ri,l), along some direction, as I decreases. Interestingly, although the 
maximal stretching rate can reach values as high as 10®, its impact on the PC approximation 
error is clearly much less important. The reason for the moderate sensitivity to coordinates 
stretching of the PC approximation error is that most of the stretching occurs along the 
directions associated with the lowest eigen-values A^, which have low to insignificant impacts 
on the model problem solution. In fact, our numerical experiments have demonstrated 
that the PC approximation error is essentially insensitive to k, provided it is small enough. 
Indeed, a fast (exponential) decay of the successive KL modes’ contributions to U is expected 
for elliptic and parabolic model problems, as the effects of short-scale fluctuations in the 
diffusivity field are filtered-out. However, coordinates stretching may yield robustness issues 
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for other model types. Finally, it is seen that choosing T equal to the minimal length-scale 
(/ = 0.1) yields a maximum stretching \//3max < 3 which is controlled over the whole range 
of I, while the case of C yields a signihcant stretching (picking to ~ 10) around the minimal 
length-scale, but quickly decays with I and remains close to 1 (see left plot of Figure [^. These 
findings confirm the appropriateness of the reference covariance function for the construction 
of the PC surrogate. 




Figure 6: (Left) Local approximation error €u{o,K,l), for o = 10 and K = 15. (Right) (log-scale) and 
inset (linear-scale); dependence on I of the maximal stretching rate \//3max(0 induced by the coordinate 
transformation B(q). The curves correspond to different definitions of the reference covariance function C’’ 
= C{F) with F as indicated or the q-averaged covariance function C. 

Concerning the PC approximation of the model-problem solution, we would like to stress 
the following points. First the approach can be readily extended to alternative and more 
elaborated PC constructions methods, including non-intrusive ones; in particular considering 
adaptive techniques where the set of polynomials used in the PC expansion is determined 
as to minimize the approximation error, instead of proceeding from PC basis with uni¬ 
form truncation order o, would clearly be beneficial, especially for problems involving high 
numbers of KL modes K and requiring high polynomial order along certain ^^’s and not 
others. Second, the numerical tests have focused on length-scale uncertainty only, which is 
indeed the hardest source of uncertainty as it affects both the magnitude and shape of the 
KL modes. In contrast, uncertainty in the process variance aj in the Gaussian covariance 
family only manifests itself in the magnitude of the eigen-values. Therefore, uncertainty in 
the pre-exponential factor aj of the Gaussian covariance can be handled through either an 
additional dimension to the PC expansion, as performed in [laiii!, or directly through our 
proposed change of coordinates approach based on the reference C, which amounts to take 
the averaged variance as the reference one. Similar to the problem for uncertain length-scale 
I, numerical tests (not shown) have demonstrated that the q-averaged definition of C leads 
to globally lower errors in presence of variance uncertainty, compared to a definition of the 
reference based on some q^. This has motivated the use of the q-averaged definition of the 
reference covariance function in the remainder of the paper. 

Finally, the PC expansion of the full model-problem solution has been considered here; 
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there may be other situations were expansion of the full model-problem solution is not 
necessary. For instance, if the nature of the observations are known prior to constructing 
the PC expansion, the direct expansion of the model predictions w(^) could be considered. 
If in addition the measurements have been performed, considering the direct PC expansion 
of the measurements to model-predictions discrepancy, Aci{$) — \di — (in the 

case of identically distributed additive noise) could be advantageous. 


5 Application examples 


In this section, we illustrate the benefit of considering prior Gaussian fields with parametrized 
covariance function in the inference of the diffusivity field in the transient diffusion problem 
introduced in Section |4.3[ We first present in Section [53 the inference problem, and intro¬ 
duce 3 cases that will serve to investigate the proposed method. We also provide details 
on the exploitation the PC surrogate constructed in Section |4| and on the PC accelerated 
formulation of the inference problem. For comparison purposes, we first solve in Section [5^ 
the Bayesian inference problem for a fixed covariance prior, that is without inferring the 
covariance hyper-parameters, using instead preassigned values. Then, in Section |5.3[ we 
considered the inference with hyper-parameters covariance and illustrate its advantage and 
behavior with respect to noise level, number of observations and surrogate polynomial order. 


5 .1 Set-up of the inference problem 

The proposed method will be illustrated for the inference of a log-diffusivity field, using 
the transient diffusion model problem corresponding to Eq. (40). To test the proposed 
method we consider three different log-diffusivity fields m{x) = log(z^ — Pq), to be inferred: 

• Sinusoidal profile: = sin(27rx). 


• Step function: 


f-1/2, a: <0.5 
[ 1 / 2 , a: > 0 . 5 ’ 


• Random profile: m^^'^{x) drawn at random from QV{0,C) where C is the Gaussian 
covariance with length-scale I = 0.25 and variance aj = 0.65. 

The inferences are performed on sets of data, {dj, i — 1,..., W}, consisting of noisy mea¬ 
surements of the solution to the diffusion equation for the three profiles. The measurements 
are taken at a set of spatial locations Xi uniformly distributed inside D = (0,1), and 
for Ut times tj uniformly distributed in (0,r). The total number of observations is then 
No = x nt. The observations are synthetically generated by perturbing the respective 
model solutions for the 3 fields tested with a measurement noise Cj randomly and inde¬ 
pendently drawn from the Gaussian distribution To avoid the so-called inverse 

crime mi, the solutions used to generate the observations are computed with a significantly 
finer spatial and temporal discretization than for the construction of the PC approximation. 
Unless otherwise specified, we use = 19, = 13 (so W = 247), with T = 0.05 and a 
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Figure 7: Illustration of inference problem for m®‘“. Plotted are the Ux = 19 observation points and the 
solution of the diffusion equation with profile at the n* observation times. 


Gaussian noise with = 0.01. Figure depicts the location of the observation points and 
the solution of the diffusion equation for at the different observation times. 

For the inference, we consider in all cases the Gaussian prior M ~ QV{Q,C{q)), where 
the covariance function C{q) has hyper-parameter q = The prior is then fully 

characterized once we have selected the prior of the hyper-parameters. We choose a uniform 
prior for I over the range [/min, ^max], with as previously /min = 0.1 and /max = 1, and an 
inverse Gamma prior na Hal HU for cTj with parameters a = 3 and (3=1. The prior 
of cr| thus has a long-tailed distribution with mean value l3/{a — 1) = 0.5 and variance 
/5^/(« — 1 )^(q; — 2) = 0.25. Note that the existence of the first moment of cr^ is enough to 
ensure the existence of the average covariance function, and that M(rf,q) G L2{D,pjj,Pq) 

(because the modes in its KL decomposition scales with y^)- In contrast, expanding the 
diffusion equation solution with respect to both the KL coordinates rj and hyper-parameter 
q (as proposed in [ElIIl!) could be problematic since, to our knowledge, there is no standard 
orthogonal polynomial family for the inverse Gamma distribution function and the solution 
U may not have second moment with y ~ Invr(3, 1) has unbounded second 

moment). Using the notation of Section]^ the prior of q is then 


1 


(aj) ^exp 


Pq(Q) =Pq(i,crf) = 


I/min /e 

. 0 , 


:|r(3) 


g .2 ) ’ ^ ^ [^min? ^max] 5 ^ ^ 


(45) 


otherwise. 


As for the noise hyper-parameter, we use the uninformative, improper, Jeffrey’s prior 

Po(<^o) « (46) 

Having specified all priors, the determination of the Bayesian posterior p(t], q, a^\d) re¬ 
quires the evaluation of the likelihood of the data d given (T],q,a^). Instead of following 
the computational flow-chart presented in Figure which would require the solution of a 
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deterministic model problem for each new sample of {r),q), we rely on coordinate transfor¬ 
mation and PC approximation as introduced in the previous sections. Following the findings 
of the previous section, the reference model problem is based on the stochastic process 
corresponding to the qf-averaged covariance function C, whose KL decomposition is trun¬ 
cated to the first K = lb dominant modes. Solving this reference problem, we obtain the 
approximation U{$) = '^a=o of fhe reference model problem solution U{$). Unless 

stated otherwise we use in the following results a PC order o = 10 with a spatial discretiza¬ 
tion involving 56 finite elements. From the approximate solution U, we can extract the PC 
approximations of the model predictions, w(^), whose components are 


p 

Uii$) = = i=l,...,No. (47) 

Q :=0 

This constitutes the offline step of the proposed PC-accelerated sampler. Once the PC 
approximation has been determined, one can use as a surrogate of the model 

predictions u(rf,q) in the (online) computation of the likelihood: 


No 


p{d\r], q, al) ^ p{d\ri, q, al) = JJpe(c?i - fii(€(»?, 


(48) 


i=l 


where ^{ri,q) is given by Eq. (38) and is defined in Eq. (§. Eor the actual definition of 
the coordinate transformation B{q) in Eq. (39), we set k = 0 since remains large 

enough for the present settings. Finally, multiplying by the hyper-parameter priors and prior 
of 77, one obtains (up to a constant normalization factor) the approximation p(rf,q,al\d) 
of the posterior distribution. The computational structure for the change of coordinates 
method and PC acceleration is schematically illustrated in Figure distinguishing between 
offline and online steps. The online step is imbedded in an adaptive Metropolis-Hasting 
algorithm to generate samples of (T],q,al) following the posterior density. 


5.2 Inference with fixed covariance parameters 

For comparison purposes, the Bayesian inference problems are first solved for the case 
of Gaussian prior covariance function having pre-assigned parameters / = 0.5 and a'j = 0.5. 
The problem therefore consists in inferring only the 15 coordinates rj and the noise hyper¬ 
parameter cr^. Note that in this case B is the identity, as the PC approximation is based on 
the prior process with pre-assigned covariance function. 

A total number of 2.5 x 10^ MCMC steps were deemed necessary for the pre-assigned 
hyper-parameters case to properly explore the posterior. The resulting chain of the KL co¬ 
ordinates were observed to be well-mixed (not shown). The marginal posteriors, estimated 
using a standard Kernel Density Estimation (KDE) method [ISIIIS], of the first 8 KL coor¬ 
dinates for the inference of m®'" are shown in Figure These posteriors are compared with 
their respective priors (standard Gaussian distributions). We notice that only the first 4 
coordinates rjk show significant improvement in their posterior distributions. This improve¬ 
ment can be quantified using the Kullback-Leibler Divergence (KLD) which is a statistical 
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Figure 8: Offline step (surrogate construction) of the accelerated MCMC sampler and Online step of the PC 
surrogate based evaluation of the posterior. 


measure that quantifies the distance between two probability distributions p and q m, 
defined according to: 


KLD(p,q) 


, 


(49) 


Here we calculate the KLD between the prior and the (marginal) posterior of each KL 
coefficient rjk- The KLD is indicated on top of each plot and quantifies the information gain 
from the observations, which is found significant only for the first 4 KL coordinates. Figure]^ 
also shows the posterior of the noise variance hyper-parameter (bottom right plot), cr^, which 
exhibits a Maximum A Posteriori (MAP) value close to the value used to generate the data, 
(jg = 0.01. The similar findings are reported for the inferences of and (results not 
shown for brevity). 

for the 3 


To better analyze the quality of the inferred fields, we report in Figure 10 


test cases, the median, 5% and 95% quantiles values of the posteriors of the inferred field 
m{x). These statistical characterizations of m are also compared with the true profiles. 
For the inference of m®'" we notice that the 5% to 95% quantiles range does not contain 
the true profile for a large set of x. This mismatch can be attributed to the pre-assigned 
hyper-parameter values that are not suitable. The same observation can be made for the 
case of In contrast, rn®*®^ is nearly everywhere within the 5%-95% quantiles range of 
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KLD = 39.6066 KLD = 63.1121 




KLD = 64.5038 




KLD = 0.011068 



KLD = 40.7446 





Figure 9: Comparison of the priors and (marginal) posteriors of the first 8 KL coordinates rjk and noise 
hyper-parameter (T^ (posterior only) for the inference of m.®'" without using covariance hyper-parameters (a 
Gaussian covariance with I = 0.5 and aj = 0.5 is assumed). The corresponding Kullback-Leibler Divergences 
(KLD) for the KL coordinates are also indicated on top of each plot. 


the inferred profile m. 

5.3 Inference with covariance hyper-parameters 

Next, we repeat the previous inference problems but considering now the covariance 
hyper-parameters I and cr^ in addition to the 15 KL modes and observation noise cr^. For 
the sampling of the posterior, a total of 2.5 x 10^ MCMC steps was found also necessary 
to satisfactorily estimate the posterior statistics, same as for the case with pre-assigned 
parameters. The chains of all KL coordinates and hyper-parameters were observed to be 
well-mixed as illustrated in Figure pTj 

The marginal posteriors of the first 8 KL coordinates pk for are shown in Figure [T^ 
together with their respective priors. The KLD values are also indicated on top of the plots. 
The results show a significant information gain for the first 7 KL coordinates, in contrast 
to only the first 4 KL coordinates when using pre-assigned parameters. In the same figure 
we show the marginal of the observation noise. The latter posterior has a MAP close to 
cTq = 0.01, corresponding to the value used to generate the observations. Similar conclusions 
can be made for the cases of 777 ,step,ran (j-gg^p^g ^ot shown for brevity). 

The pdfs of the posterior of the hyper-parameters are shown in Figures and compared 
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Figure 10: Comparison of the posterior of m{x) with the true profile, for the cases of and 

(from left to right). The inferences use a fixed Gaussian covariance function with I = 0.5 and ctj = 0.5. 
Shown are the median, 5% and 95% quantiles of the posterior and true profile. 



Iteration Iteration 


Figure 11: Illustration of the chain generated by MCMC using PC surrogate and coordinate transformation: 
successive samples of (Left) few KL coordinates and (Right) hyper-parameters q. Case of the inference of 


with their priors for m®'". The results show a significant difference between the prior and 
posterior of the covariance length scales I, with a MAP around I = 0.2, while the posterior 
probability of I > 0.4 is essentially zero. On the contrary, the posterior of the covariance 
variance aj has a similar structure to that of its prior, with a shift of the expected (and 
MAP) value toward higher values. 

5.3.1 Comparison with the inferences with and without hyper-parameters 

To better appreciate the improvement resulting from the introduction of the covariance 
hyper-parameters, we first provide a comparison of the inferred median profiles, obtained by 
inferring covariance hyper-parameters or by using pre-assigned values. The median profiles 
for all three cases are plotted in Figure [T^ which also depicts the true profiles. It is seen 
that introducing the covariance hyper-parameter significantly reduces the distance between 
the median and true profiles in the smooth cases {nf™ and while having no signifi¬ 
cant impact on the inference of the piecewise constant profile This behavior can be 

explained by the family of Gaussian processes considered, which is not well-suited for the 
inference of and so the introduction of hyper-parameters does not help improving the 

inference. 

Second, the median, mean, MAP, 5% and 95% quantiles of the inferred log-diffusivity 
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Figure 12: Comparison of the priors and (marginal) posteriors of the first 8 KL coordinates 7 ]^ and 

noise hyper-parameter (posterior only) for the inference of with covariance hyper-parameters. The 
corresponding Kullback-Leibler Divergences (KLD) for the KL coordinates are also indicated on top of each 
plot. 


profiles are plotted in Figure 1^ and compared with the respective true prohles 777,8™,step,ran 
These plots should be contrasted with the results shown in Figure obtained with pre¬ 
assigned covariance. Consistent with the previous observations on the medians, we observe 
that in the case of the discontinuous profile, the inference of the hyper-parameters only 

affects slightly the 5% and 95% quantiles. On the contrary, for the smooth profiles 7778 ™-^®-" the 
5% and 95% quantiles bounds now contain the true profiles for nearly every x. This significant 
improvement is due partly to the better agreement between the true and median profiles, but 
also to a generally higher variability in the posterior when considering the hyper-parameters 
in the inference process. In other words, the inference of the covariance hyper-parameters 
appears to yield a more flexible approach than when using a fixed covariance assumption. 


5.3.2 Effects of measurement noise and number of observations 

To investigate the impact of the observations on the inference processes, with or without 
covariance hyper-parameters, we repeat the previous inference problems for different noise 
level af in the observations and different number of spatial locations n^. The results are 
reported in Figure [l^ in terms of median profiles, for the three test profiles 7778 ™. 8 tep,ran 
left to right) and the inference without (top row) and with covariance hyper-parameters 
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Figure 13: Priors and (marginal) posteriors of the covariance hyper-parameters I (left plot) and aj (right 
plot) for the inference of Also indicated are the corresponding Kullback-Leibler Divergences (KLD). 





Figure 14: Comparison of the true log-diffusivity profiles with corresponding posterior medians for the 
inference with covariance hyper-parameters and preassigned covariance. Cases of and from 

left to right. 

(bottom row). As expected, the plots indicate an improvement of the inferred (median) 
profiles when the noise level is lowered, and when the number of observation increases. The 
improvements are more significant in the cases of the smooth profiles ( 777 , 8 ™,ran^ than for the 
discontinuous one (m®*®^), a result consistent with the previous observations. In addition, 
for the smooth cases, the improvements carried by the introduction of the covariance hyper¬ 
parameters in the inference problem is seen to not only yield median profiles closer to the 
true ones, but also to significantly accelerate the convergence to the truth. The improve¬ 
ment of the convergence rate would require additional numerical experiments to be precisely 
measured, but it can already be safely asserted that more information is gained from the 
observations when considering the covariance hyper-parameters in the inference. 

5.3.3 Convergence with the PC surrogate order 

Finally, we illustrate in Figure [T^ the dependence of the inferred median profiles on 
the selected order o for the PC surrogate model. The figure shows that, irrespective to 
the smoothness of the true profile, the inferred medians quickly converge as o increases, 
demonstrating that the L 2 convergence of the PC surrogate with coordinate transformation 
reported in Section [473| transfers to the inference problem. In fact, in view of the convergence 
curves shown in Figure the differences in the inferred median profiles for o = 8 and o = 10 
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Figure 15; Comparison of the posteriors profiles with the true ones, for the cases of m®‘", and 

(from left to right). The inferences use covariance function with hyper-parameters. Shown in each plots are 
the median, mean, MAP, 5% and 95% quantiles of the posterior and true profiles. 

are more likely to come from sampling errors than from differences in the PC surrogates. 

6 Discussion and Conclusion 

This paper presented a Bayesian approach to infer a parameter field from prior QV hav¬ 
ing a covariance function involving some hyper-parameters q. The main contribution of the 
present work is the introduction of a coordinate transformation in order to represent the 
prior QV using a unique reference basis of spatial modes, while the effects of the covariance 
hyper-parameters is reflected by the (joint) prior probability density function of the ran¬ 
dom coordinates of the QV that becomes conditioned on q. The coordinate transformation 
naturally leads to the construction of a unique polynomial surrogate for the forward model 
predictions; this surrogate model accounts for the dependence of the model predictions on 
the coordinates of the QV in the reference basis. For a Polynomial Chaos approximation, 
as considered in this paper, the construction of the surrogate amounts to solving a unique 
(stochastic) reference problem, assuming the independence of the QV coordinates. The 
stochastic dimensionality of the surrogate model is therefore equal to the dimensionality of 
the (truncated) QV representation, and is not augmented by the number of hyper-parameters 
intervening in the covariance function parametrization. This fact has to be contrasted with 
the alternative approaches proposed in [laiii! where the PC expansion explicitly incorpo¬ 
rates the dependencies on q. Another advantage of selecting a reference problem for the 
construction of the PC surrogate, compared to the direct expansion with respect to the co- 
variance hyper-parameters, is that it can overcome issues related to hyper-parameters with 
complex distributions, e.g. improper, no second-order moments, ... for which classical PC 
bases may not exist. 

The surrogate model can then be substituted for the true model predictions in the def¬ 
inition of the likelihood of the observations appearing in Bayes’ formula for the posterior 
of the QV coordinates and covariance hyper-parameters. The resulting approximate likeli¬ 
hood can in turn be imbedded in a MCMC sampler to greatly accelerate the sampling of 
the posterior distribution, with significant computational savings. In its present form, the 
proposed method however introduces some overhead during the sampling stage, compared 
to other approaches relying on PC acceleration with explicit dependence on q: for any new 
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Figure 16: Effect of observations number and noise. Shown are the medians of the inferred profiles for the 
three test cases 777,® step,ran right), and inferences for a pre-assigned covariance function (top 

row) or with hyper-parameters (bottom row). 


proposed values of the hyper-parameters the coordinate transformation must be determined. 
The determination of the transformation, given g, requiring the computation of the dom¬ 
inant subspace of the covariance function (given q) may constitute a severe limitation for 
large scale problems (for the simplified problems presented in Section the CPU time of 
the inference with hyper-parameters was found roughly three time as large as for the case 
without hyper-parameters). To remedy this point in the future, we plan to approximate the 
dependence of the coordinate transformation, on q using, again, a PC expansion. As 
for the construction of the PC surrogate of the model predictions, the approximation of the 
coordinate transformation will be computed off-line and subsequently used in-line within the 
sampler. 

The numerical experiments presented in the paper, although based on a simple model. 



Figure 17: Effect of PC order o on the inferred median of the posterior: cases of (Left) (Center) 

and (Right) 
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have highlighted the following points: 

• Using for reference basis the truncated set of dominant modes of the qf-averaged covari¬ 
ance function is not only optimal (on average) for the representation of the processes 
with variable q, but it also appears as the best choice in terms of averaged error for 
the PC surrogate of the model prediction in our example. 

• The control of the stretching induced by the coordinate mapping B{q) is crucial for the 

error control; while using the marginalized conditional density Pfjifllq) appears to be 
an appropriate choice, other alternatives may be conceived. In particular, augmenting 
the variability of the reference process could improve the robustness of the 

surrogate PC model. 

• The introduction of covariance functions with hyper-parameters clearly improved the 
inference results in the problems considered, particularly when inferring smooth pro¬ 
files. In particular, information gain was observed for a larger set of coordinates. In 
addition, when covariance hyper-parameters was accounted for, the convergence rate 
of the inferred field with increased number of observations and reduced observation 
noise also seemed to improve. 

• The convergence with the PC surrogate order seems quite fast for the presented prob¬ 
lems, suggesting to possibility of using moderate PC orders, particularly to balance 
PC error and posterior sampling errors. 

On the basis of the present findings, we plan for future work to develop the coordinate 
transformation approach to further exploit the posterior structure involving the conditional 
prior probability of the transformed coordinates f) and derive samplers adapted to this par¬ 
ticular structure. Regarding the construction of the PC surrogate model, consideration of 
adaptive constructions would be beneficial to reduce the computational cost of the off-line 
step, to increase accuracy, and further accelerate the sampler. Further, the PC approxi¬ 
mation of the coordinate transformation appears to be a key element to make the whole 
approach effective to handle large scale problems. In addition, the proposed method, in par¬ 
ticular the construction of the PC approximation of the model prediction, would certainly 
benefit from fitting the procedure to the posterior distributions (of coordinates and hyper¬ 
parameters) rather than to the prior ones, especially when the observations are informative. 
Since these posterior distributions are not known a priori, iterative constructions are needed. 
Pursuit of these avenues is currently considered on a complex problem arising in subsurface 
geological models and earthquake model. 
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